ABSTRACT

In the majority of digital signal processing (DSP) applications, the critical operations are multiplication and
accumulation. A multiplier-accumulator (MAC) unit that consumes little power is key to achieving a
high-performance digital signal processing system. Finite impulse response (FIR) filters are widely used in various DSP
applications. The purpose of this work is the design and implementation of a FIR filter using a low-power
MAC unit, with clock gating and pipelining techniques to save power.

INTRODUCTION

Finite impulse response (FIR) filters are widely used in various DSP applications. This paper describes an
approach to implementing a low-power digital FIR filter on field programmable gate arrays (FPGAs).
The advantages of the FPGA approach to digital filter implementation include higher sampling rates than are available
from traditional DSP chips, lower costs than an ASIC for moderate-volume applications, and more flexibility than the
alternative approaches. First, a single MAC unit is designed with appropriate geometries that give optimized power, area,
and delay. Similarly, N MAC units are designed and controlled for low power using control logic that enables
each stage at the appropriate time. The multiply-accumulate unit has become one of the essential building blocks in digital
signal processing applications such as digital filtering, speech processing, video coding, and cellular phones.

The project also investigates various multiplier and adder architectures that are suitable for
high-throughput signal processing while achieving low power consumption. The results show that the
latch-based design can reduce dynamic power consumption by 92%, and pipelining reduces it by up to 95%.

MULTIPLY-ACCUMULATE  UNITS 

A variety of approaches to implementing the multiplication and addition portions of the MAC function
are possible. A conventional MAC unit consists of a multiplier and an accumulator that contains the sum of the previous
consecutive products. The structure of the MAC unit is illustrated in Figure 1. It multiplies two values, then adds
the result to the previously accumulated value, which must then be stored back in the registers for future accumulations.
The function of the MAC unit is given by the following equation.

In  computing,  especially  digital  signal  processing,  the  multiply-accumulate  operation  is  a common  step  that 
computes  the  product  of  two  numbers  and  adds  that  product  to  an  accumulator.  The  hardware  unit  that  performs  the 
operation  is  known  as  a multiplier  accumulator  (MAC  or  MAC  unit);  the  operation  itself  is  also  often  called  a MAC  or  a 
MAC operation. The MAC operation modifies an accumulator a:

$a \leftarrow a + (b \times c)$

where b and c are the two numbers being multiplied.

Modern computers may contain a dedicated MAC, consisting of a multiplier implemented in combinational logic
followed  by  an  adder  and  an  accumulator  register  that  stores  the  result.  The  output  of  the  register  is  fed  back  to  one  input  of 
the  adder,  so  that  on  each  clock  cycle,  the  output  of  the  multiplier  is  added  to  the  register.  Combinational  multipliers 
require  a large  amount  of  logic,  but  can  compute  a product  much  more  quickly  than  the  method  of  shifting  and  adding 
typical  of  earlier  computers.  The  first  processors  to  be  equipped  with  MAC  units  were  digital  signal  processors,  but  the 
technique  is  now  also  common  in  general-purpose  processors. 
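
The feedback loop described above can be illustrated with a small behavioral model. This is only a sketch in Python, not the authors' VHDL; the names `MacUnit`, `clock`, and `dot_product` are hypothetical, and bit widths and overflow are ignored.

```python
class MacUnit:
    """Behavioral model: a multiplier feeding an adder and an accumulator register."""

    def __init__(self):
        self.acc = 0  # accumulator register, assumed cleared at reset

    def clock(self, b, c):
        """Model one clock cycle: multiply b by c and add the product to the register."""
        product = b * c                 # combinational multiplier
        self.acc = self.acc + product   # register output fed back to the adder input
        return self.acc


def dot_product(xs, ys):
    """Accumulate x*y over paired inputs, one MAC operation per clock cycle."""
    mac = MacUnit()
    for x, y in zip(xs, ys):
        mac.clock(x, y)
    return mac.acc


result = dot_product([1, 2, 3], [4, 5, 6])  # 4 + 10 + 18 = 32
```

Each call to `clock` stands for one clock cycle, so a dot product of length three takes three cycles in this model.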

We propose a design methodology for the MAC unit structure, extended to handle two's complement
multiplication, as shown in Figure 1. The major components of this signed MAC unit are a signed multiplier, a signed adder,
a multiplexer, and an XOR gate. We choose a 12-bit precision input bus and add one extra sign bit, so in total
13 bits are applied at the input side, and the output has 31-bit precision.




Figure  1:  Basic  Structure  of  Signed  MAC  Unit 

6 TAP  FIR  FILTER 


Figure  2:  Basic  Structure  of  6 tap  FIR  Filter 


In this work we propose a 6-tap FIR filter design, as shown in Figure 2. The input is
delayed and fed to the multipliers; each multiplier produces the product for a different filter coefficient, and all these
products are accumulated to give the FIR filter output. We obtained coefficients from MATLAB and converted
them to binary for input to the filter; any other coefficients can be supplied instead.
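
The delay, multiply, and accumulate structure described above can be sketched as a direct-form FIR filter. The Python below is a minimal illustration, not the paper's VHDL; the function name `fir_filter` and the coefficient values in `COEFFS` are hypothetical placeholders, since the paper does not list the MATLAB coefficients.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * x[n - k]."""
    taps = [0] * len(coeffs)  # delay line, newest sample first, assumed cleared at reset
    outputs = []
    for x in samples:
        taps = [x] + taps[:-1]          # shift the new sample into the delay line
        acc = 0
        for h, delayed in zip(coeffs, taps):
            acc += h * delayed          # one multiply-accumulate per tap
        outputs.append(acc)
    return outputs


COEFFS = [1, 2, 3, 3, 2, 1]  # hypothetical 6-tap coefficient set, for illustration only
impulse_response = fir_filter([1, 0, 0, 0, 0, 0, 0, 0], COEFFS)
```

Feeding a unit impulse returns the coefficients themselves followed by zeros, which is a quick sanity check for any tap ordering.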

POWER  CONSUMPTION 

A limiting factor in many modern DSP systems is power consumption, for two reasons.
First, as systems become larger, whole systems are integrated on a single chip (System-on-Chip, SoC), and
the clock frequency increases, total power dissipation approaches the limit at which an expensive cooling system is
required to avoid overheating. Second, portable equipment such as cellular phones and portable computers is
becoming increasingly popular. These products use batteries as their power supply. A decrease in power consumption
increases portability, since smaller batteries can be used with a longer lifetime between recharges. Hence, design for low
power consumption is important.

DYNAMIC  POWER  DISSIPATIONS 

Dynamic power makes up a large portion of the total power consumed by an FPGA design. In CMOS
circuits, the dominant source of power dissipation is dynamic power dissipation, which occurs whenever logic levels
change at points in the circuit in response to changes in the input signals. Dynamic
power is determined by the following equation.

$P_D = \alpha C V^2 f$

where $\alpha$ is the switching activity factor, C is the capacitance, V is the supply voltage, and f is the clock
frequency. In addition to voltage and physical capacitance, switching activity influences dynamic power consumption.
A chip may contain an enormous amount of physical capacitance, but if there is no switching in the circuit, no
dynamic power is consumed. The data activity determines how often this switching occurs.
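
As a numerical illustration of this relation, the sketch below evaluates the dynamic power for one set of operating values. All numbers are hypothetical and chosen only to make the arithmetic easy to follow; they are not measurements from this work.

```python
def dynamic_power(alpha, capacitance_f, voltage_v, frequency_hz):
    """Dynamic power in watts: alpha * C * V^2 * f."""
    return alpha * capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical operating point: 10 pF switched capacitance, 1.5 V supply, 50 MHz clock.
baseline = dynamic_power(alpha=0.5, capacitance_f=10e-12, voltage_v=1.5, frequency_hz=50e6)
reduced = dynamic_power(alpha=0.25, capacitance_f=10e-12, voltage_v=1.5, frequency_hz=50e6)
idle = dynamic_power(alpha=0.0, capacitance_f=10e-12, voltage_v=1.5, frequency_hz=50e6)
```

With these assumed values the baseline is about 0.56 mW, halving the activity factor halves the power, and zero activity gives zero dynamic power regardless of capacitance.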

LOW-POWER  DESIGNS 

Design for low power has become increasingly important in a wide variety of applications, including digital signal
processing, mobile computing, high-performance computing, and high-speed networking. Here, power reduction is achieved
by using a MAC unit inside the filter that reduces the total switching activity and therefore the dynamic power. The
dynamic power equation shows that dynamic power consumption is proportional to switching activity. Therefore, minimizing
switching activity can effectively reduce power dissipation without impacting circuit performance.

The activity can be reduced with different methods and at different levels.

CLOCK  GATING 

Low-power techniques are essential in modern VLSI design due to the continuous increase of clock frequency and
chip complexity. Various recently proposed techniques achieve low-power operation by reducing signal switching activity.
Such techniques are generally applied to internal nodes with high capacitive load, which contribute heavily to total power
dissipation. In particular, the clock system, composed of flip-flops and the clock distribution network, is one of the most
power-consuming subsystems in a VLSI circuit. As a consequence, many techniques have been proposed to reduce clock
system power dissipation.
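
To make the effect on clock activity concrete, the sketch below counts how many clock edges reach a register with and without gating. It is a simplified model under stated assumptions: the function name `clock_edges_delivered` and the enable pattern (a stage that is active one cycle in six) are hypothetical and not taken from the design.

```python
def clock_edges_delivered(enables, gated):
    """Count rising clock edges that reach a register, one entry in `enables` per cycle."""
    if not gated:
        return len(enables)                   # free-running clock: an edge every cycle
    return sum(1 for en in enables if en)     # gated clock: an edge only when enabled


# Hypothetical enable pattern: the stage is needed once every six cycles.
enables = [cycle % 6 == 0 for cycle in range(60)]
free_running_edges = clock_edges_delivered(enables, gated=False)
gated_edges = clock_edges_delivered(enables, gated=True)
```

In this model the gated register sees 10 clock edges instead of 60, so its clock-related switching activity falls in proportion to how rarely the stage is enabled.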

LATCH-BASED  DESIGN 

In some applications, latch-based designs are preferred to D flip-flop (DFF)-based designs. The basic concept is
that a DFF can be split into two latches, each clocked with an independent clock signal. [14]

The two clocks are nonoverlapping, as presented in Figure 3. A combinational network is usually inserted
between the two latches to build a pipelined data path. The main advantage is that this kind of design tolerates greater
clock skew before failing than a similar DFF-based design. The second advantage is that time borrowing is achieved
naturally in the pipelined data path.
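
The idea of splitting a DFF into two latches driven by nonoverlapping clocks can be sketched behaviorally. The model below is an illustration only: the names `Latch` and `run_two_phase` are hypothetical, no combinational logic is placed between the latches, and timing effects such as skew and time borrowing are not modeled.

```python
class Latch:
    """Level-sensitive latch: transparent while enable is high, holds otherwise."""

    def __init__(self):
        self.q = 0  # stored value, assumed cleared at reset

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q


def run_two_phase(inputs):
    """Drive two latches with nonoverlapping phases; return the output seen at the start of each cycle."""
    first, second = Latch(), Latch()
    seen_at_output = []
    for d in inputs:
        seen_at_output.append(second.q)
        # Phase 1: clk1 high, clk2 low. The first latch follows d, the second holds.
        first.update(d, enable=True)
        second.update(first.q, enable=False)
        # Phase 2: clk1 low, clk2 high. The first latch holds, the second follows it.
        first.update(d, enable=False)
        second.update(first.q, enable=True)
    return seen_at_output


delayed = run_two_phase([5, 7, 9])
```

Because the two phases never overlap, the pair never becomes transparent end to end, and the output lags the input by one cycle just as a DFF would.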

CLOCK GATING OF LATCH BASED DESIGN

Latch-based designs provide several advantages over single-clock master-slave flip-flop designs. The constraint
with respect to clock skew can be relaxed for both the clk1 and clk2 clock trees. This allows the synthesizer and router
to use smaller clock buffers and to simplify clock tree generation, which reduces the power consumption of the
clock tree.


Figure 3: Clock Gating of Latch-Based Design

PIPELINING 

While FPGAs provide flexibility for performing high-performance DSP functions, they consume a significant
amount of power. For arithmetic circuits, a large portion of the dynamic power is wasted on unproductive signal glitches,
and previous studies have shown that power dissipation caused by glitching can make up a significant share of total
dissipated power. An important technique for reducing FPGA power consumption is therefore to reduce the amount of signal
glitching within the circuit. Pipelining is one such technique: a pipelined design has less logic between registers and
is therefore less prone to glitching. Previous studies have shown that pipelining can be used to reduce power by
90%.
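
A functional sketch of a pipelined MAC is given below. It assumes a single pipeline register placed between the multiplier and the adder, which is an illustrative choice and not necessarily where the registers sit in the implemented filter; the names `PipelinedMac` and `pipelined_dot` are hypothetical. The model shows only the behavior and latency of the pipeline and does not simulate glitches.

```python
class PipelinedMac:
    """MAC with an assumed pipeline register between the multiplier and the adder."""

    def __init__(self):
        self.product_reg = 0  # pipeline register holding the previous cycle's product
        self.acc = 0          # accumulator register

    def clock(self, b, c):
        # Both registers update on the same clock edge, so the adder
        # consumes the product captured on the previous cycle.
        self.acc = self.acc + self.product_reg
        self.product_reg = b * c
        return self.acc


def pipelined_dot(xs, ys):
    mac = PipelinedMac()
    for x, y in zip(xs, ys):
        mac.clock(x, y)
    mac.clock(0, 0)  # one extra cycle flushes the last product into the accumulator
    return mac.acc


pipelined_result = pipelined_dot([1, 2, 3], [4, 5, 6])
```

The result matches the unpipelined sum of products at the cost of one extra cycle of latency, while each register-to-register path now contains only the multiplier or only the adder.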

ACTIVE  HDL  SCHEMATIC 


Figure 4: Active HDL Simulation of 6 Tap Sequential FIR Digital Filter

Figure 5: Active HDL Simulation of 6 Tap Latch based FIR Digital Filter

Figure 6: Active HDL Simulation of 6 Tap Pipelined FIR Digital Filter

SIMULATION  AND  RESULTS 


A FIR filter scheme suitable for unsigned and signed computations is presented in this paper. Low-power designs
for a 6-tap FIR filter using latch-based and pipelining techniques are implemented. The filters are designed using
MATLAB and described in VHDL. Simulation is performed using Active-HDL, functional verification is carried
out using Altera Quartus II, and the FPGA implementation targets a Cyclone device. Figure 9 shows the simulation result obtained in
Active-HDL for the 6-tap FIR filter. The simulation result of the latch-based FIR filter is shown in Figure 8, whereas Figure 6 shows
the simulation result of the pipelined FIR filter in Active-HDL. Input values and filter coefficients are supplied
to the digital FIR filter by a finite state machine.


Figure  7:  Simulation  Results  for  Sequential  6 Tap  FIR  Filter 

Figure 8: Simulation Results for Latch Based 6 Tap FIR Filter

Figure 9: Simulation Results for 6 Tap Pipelined FIR Filter

Power Analyzer Summary


Table 1: Power Analyzer Summary

6 Tap FIR Filter        Core Dynamic Thermal Power Dissipation
Sequential Filter       21.40 mW
Latch Based Filter      1.73 mW
Pipelined Filter        1.09 mW

It can be seen that dynamic power consumption of the original 6-tap FIR filter is decreased through the use of two
techniques, latch-based clock gating and pipelining, as shown in Table 1. The proposed FIR filters have been synthesized and
implemented using Altera Quartus II, and power is analyzed using the PowerPlay Power Analyzer tool.
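
The relative savings implied by Table 1 can be checked with a short calculation. The sketch below uses only the three power figures reported in the table; the helper name `reduction_percent` is hypothetical.

```python
def reduction_percent(baseline_mw, improved_mw):
    """Percentage reduction of improved_mw relative to baseline_mw."""
    return (baseline_mw - improved_mw) / baseline_mw * 100.0


SEQUENTIAL_MW = 21.40   # sequential filter, from Table 1
LATCH_BASED_MW = 1.73   # latch based filter, from Table 1
PIPELINED_MW = 1.09     # pipelined filter, from Table 1

latch_saving = reduction_percent(SEQUENTIAL_MW, LATCH_BASED_MW)    # about 91.9
pipeline_saving = reduction_percent(SEQUENTIAL_MW, PIPELINED_MW)   # about 94.9
```

These values round to the 92% and 95% reductions quoted for the latch-based and pipelined designs.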

CONCLUSIONS 

While FPGAs provide flexibility for performing high-performance DSP functions, they consume a significant
amount of power. Often, a large portion of the dynamic power is wasted on unproductive signal glitches.
Reducing glitching reduces dynamic energy consumption. A low-power design for a 6-tap FIR filter has been presented in this
work. The power reduction is achieved by using a MAC unit inside the filter that reduces the total activity and
therefore the dynamic power. The basic building blocks of the MAC unit are identified, each block is analyzed
for its performance, and power is calculated for the blocks. The 6-tap digital FIR filter has been designed with enable
control to reduce the total power consumption, based on pipelining and latch-based clock gating techniques.

Active-HDL together with the Altera Quartus II tool is used effectively to model dynamic transient signal activity and
produce accurate power consumption estimates. In this work the original FIR filter, the latch-based filter, and the pipelined filter are
implemented in a Cyclone EP1C6Q240C7 FPGA. The results above show that the latch-based design can reduce dynamic
power consumption by 92%, and pipelining reduces it by up to 95%.